Context-based and user-profile driven information retrieval.



FIELD OF THE INVENTION 

The invention relates to a method and system for enabling retrieval of an 
information item from an information base in an electronic network. 

BACKGROUND ART

Rapidly expanding information archives provide access to terabytes of 
electronic data, e.g., electronic museums, electronic newspapers, musical archives, digital 
libraries, software archives, mailing lists, up-to-date weather information and geographic data. 
Consequently, current advances in information technology are driven by the need to increase 
the effectiveness of information access and retrieval.

Traditionally, information providers try to overcome the inadequacies of
information retrieval by providing fast and powerful search engines, see, for example, US
patent 5,293,552 (PHN 13,666) herewith incorporated by reference. Retrieval mechanisms
based on keywords typically return a large set of documents, but the returned set is not
very precise. Examples of searching systems are commonly available search engines, databases
and library lookup systems. The user interacts with the system by providing a query with
sufficient information and gets back a set of documents that more or less match the query.

Traditional approaches have devised mechanisms to map a user's query to a
document based on overlapping terms or concept words between the query and the document
terms.

One approach is known from "Experiments on Using Semantic
Distances Between Words in Image Caption Retrieval", Alan F. Smeaton and Ian Quigley,
Proceedings of the 19th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, August 1996, Zurich, Switzerland. This approach uses
a quantitative measure of semantic similarity between index terms for queries and documents.

Another recent method is described in "A Deductive Data Model for Query
Expansion", Kalervo Jarvelin, Jaana Kristensen, Timo Niemi, Eero Sormunen and Heikki
Keskustalo, Proceedings of the 19th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, August 1996, Zurich, Switzerland. This
method introduces concept-based query expansion, where each concept is expanded to a
disjunctive set of concepts on the basis of conceptual relationships pointed out by the user.

Yet another known idea is proposed in "Incremental Relevance Feedback for
Information Filtering", James Allan, Proceedings of the 19th Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, August 1996,
Zurich, Switzerland. This idea relates to relevance feedback techniques that process shifts in
user interest patterns over a period of time. The user feeds back notions of which query results
he/she believes are relevant to the current query.

OBJECT OF THE INVENTION

A key to effective information retrieval lies in mechanisms that increase the
precision values for documents retrieved. One problem with existing search systems is that if
the query is not very precise, the user is left with the task of scanning through a large
amount of result data to identify documents of interest, because a large percentage of the
information retrieved is not relevant to the user. Another drawback is that the known retrieval
methods supply a set of results that is restricted to the literal search criteria entered at that
moment and not much else. That is, electronic information retrieval does not have the
advantages of real-life browsing at a bookstore, where an interesting book cover may catch a
person's eye and divert his/her attention or awaken his/her interest. Consequently, the
information provider is unable to guide the user to other, yet related, works that could be of
interest to this particular user.

It is therefore an object of the invention to provide a method for retrieving
information that improves the quality of the result data.

SUMMARY OF THE INVENTION

To this end, the invention provides a method of enabling a user to navigate
through, and to query, an electronic document base. The user supplies at least one query
object, e.g., a word, a geometrical shape or pattern, a tune or rhythm representing one or
more bars of a piece of music, etc. The method comprises determining a topical context for the
query by means of extracting from an access history, e.g., at least one preceding query, of the
user to the document base at least one concept object associated with the current query. The
concept object is used to create at least part of a user profile. Then one or more documents
are identified in the document base under control of the user profile. The profile is updated
based on the content of the identified document.

The invention increases the effectiveness of browsing wide-area information by
means of focusing primarily on the user's interest as given by the user's access history in
terms of the results of previous queries. Taking these results into account for next queries
creates a context that enables interpreting the current query object in view of what currently is
likely to be of interest to this specific user. The context for the current query is used to update
the user's profile. The profile itself is used as a recommendation for mapping relevant
information from the information provider's topic space, also referred to as document base,
onto the user's search space.

The profile gets updated dynamically in response to the user's interactions with
the document base. Accordingly, the dynamic part reflects the path taken within the provider's
information space in the course of the user's search. Preferably, the profile also has a static
part that reflects the user's long-term interests. The term "static" is used to indicate a time
scale substantially slower than that of the dynamic part. The static part is determined by, for
example, letting the user provide topical information about his/her fields of attention the first
time that the user interacts with the document base. Such entries can be changed manually in
due course. Alternatively or subsidiarily, statistical analysis of a statistically relevant number
of results over time enables finding themes that stay substantially constant.

The preferred embodiment of the invention allows the user to retain a constant
theme in his/her profile (static part) as well as to influence the profile by new issues (dynamic
part) generated while browsing the provider's information space. This latter aspect of the
invention gives a mechanism to information-providers to attract the user's interest while the
latter is browsing at their sites.

Preferably, the user is allowed to disable and enable the static and/or dynamic
part of his/her profile so as to be able to choose whether or not to use the profiling in retrieving
information.

Thus, the invention enables clustering and re-clustering of the information
space in a manner effective for highly personalized browsing. The invention can be regarded
as an automatic version of the "refine" button as provided by various search engines found on
the Internet.

For the example with the music data base mentioned above, see U.S. patent 
application serial no. 08/840,356, filed April 28, 1997 (PHA 23,241), herein incorporated by 
reference. 

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained by way of example and with reference to the
accompanying drawings, wherein Fig. 1 is a diagram illustrating the method of the invention.

PREFERRED EMBODIMENTS 

Fig. 1 is a diagram of a system 100 illustrating the method according to the
invention by way of its main functionalities. System 100 has an electronic document base 102
and a user terminal or client 104 through which the user interacts with document base 102. For
example, client 104 comprises an alphanumeric keyboard or a speech coder (not shown) and a
display device (not shown). The user enters, in this example, query words into system 100
through the keyboard or speech coder and gets visual feedback on his/her entry and the query
results as explained below.

System 100 comprises a static profile memory 106 that stores indications of
what represents this individual user's long-term interests. For example, the user has provided
topics that represent her/his main fields of interest upon being introduced to system 100 for the
first time. Alternatively, if, for example, the user's cultural and social background and
profession are known, the system may assign by default this particular user to a particular
category typical of this type of user. Alternatively, or subsidiarily, the user may specify that
she/he is definitely NOT interested in specific topics so as to be able to exclude certain
categories of documents right from the outset. All this information contributes to creating a
long-term profile of this user which is stored in memory 106.

It is assumed that the user is interacting with system 100 for the first time and
enters a query word through client 104. System 100 now enables interpreting this query word
within a certain context that is determined by the static profile as stored in memory 106.
System 100 has a context generator 108 that generates one or more additional keywords
associated with the topic under consideration as given by the user's entry. This is done, for
example, via an algorithm that is based on a topical partitioning of the information space
spanned by the documents in document base 102. Alternatively, the keyword entered through
client 104 is mapped onto semantically similar terms in a dictionary. The mapping is
controlled by static profile 106 to eliminate unrelated topics. For example, the entries
"processor" and "micro" can be mapped onto the topic "computers" via "microprocessors",
but also onto the topic "cooking" via "food processor" and "microwave oven". If the user is a
rabid amateur cook with much too little time because she/he is a very busy specialist in
parallel data processing architectures, both topics may be relevant and the context should
include both. If the static profile indicates that the user is only interested in one of these
categories, the context should cause documents of the other category to be neglected. If the
static profile comprises neither indication, the context should permit documents of both
categories to be retrieved if present in document base 102. In order to achieve this selection,
the keyword and one or more context keywords are entered into the search engine of document
base 102. If the static profile has a category "NOT", i.e., one or more topics to be excluded in
advance from the search, the search engine is caused to execute the corresponding Boolean
operation so as to discard unwanted documents that literally match the "NOT" conditions.
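
The role of the context generator can be illustrated with a minimal Python sketch. It is not the implementation of generator 108; the dictionary contents, the names `TOPICAL_DICTIONARY`, `StaticProfile`, `generate_context` and `build_boolean_query`, and the textual query syntax are all assumptions made for illustration. The sketch prefers topics listed in the static profile, falls back to all topics when the profile gives no indication, and appends the "NOT" topics as Boolean exclusions.

```python
from dataclasses import dataclass, field

# Hypothetical topical dictionary: keyword -> {topic: related context term}.
TOPICAL_DICTIONARY = {
    "processor": {"computers": "microprocessor", "cooking": "food processor"},
    "micro": {"computers": "microprocessor", "cooking": "microwave oven"},
}


@dataclass
class StaticProfile:
    interests: set = field(default_factory=set)  # long-term topics of interest
    excluded: set = field(default_factory=set)   # the "NOT" category


def generate_context(keywords, profile):
    """Return context terms for the keywords, filtered by the static profile."""
    context = []
    for keyword in keywords:
        senses = TOPICAL_DICTIONARY.get(keyword, {})
        # Topics in the "NOT" category never contribute context terms.
        candidates = {t: s for t, s in senses.items() if t not in profile.excluded}
        # Prefer topics the profile names; with no indication, keep all of them.
        preferred = {t: s for t, s in candidates.items() if t in profile.interests}
        for term in (preferred or candidates).values():
            if term not in context:
                context.append(term)
    return context


def build_boolean_query(keywords, profile):
    """Combine keywords, context terms and exclusions into one query string."""
    terms = list(keywords) + generate_context(keywords, profile)
    query = "(" + " OR ".join(f'"{t}"' for t in terms) + ")"
    for topic in sorted(profile.excluded):
        query += f' AND NOT "{topic}"'
    return query
```

With `StaticProfile(interests={"cooking"})`, the keywords "processor" and "micro" yield the context terms "food processor" and "microwave oven"; with an empty profile, terms of both topics are kept.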

It is assumed that document base 102 identifies a large number of documents
that match the combination of the words entered by the user within the context generated by
generator 108. The identifiers of these documents are returned to the user, for example in the
format used by the PlanetSearch service of Philips Electronics at
http://www.planetsearch.com/, whose search engine is described in U.S. patent 5,293,552.
The results in this format are ranked according to relevance, and the relative
contribution of each keyword to each specific result is indicated by a colored bar. The results
of this query are also sent to an analyzer 110. Analyzer 110 generates a set of concept
keywords based on these results. The generation algorithm uses, for example, the topical
partitioning of the information space of base 102 and a weighted topical dictionary. Such
algorithms are known in the art. These concept keywords are then stored in a memory 112 that
represents the user's dynamic profile.
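
As a rough illustration of how an analyzer might derive concept keywords from query results, the following Python sketch scores concepts with a weighted topical dictionary and stores new ones in a dynamic profile. The text only states that such algorithms are known in the art, so the scoring scheme, the dictionary contents and the names `WEIGHTED_TOPICS`, `extract_concepts` and `update_dynamic_profile` are assumptions, not the algorithm of analyzer 110.

```python
from collections import Counter

# Hypothetical weighted topical dictionary: term -> {concept: weight}.
WEIGHTED_TOPICS = {
    "recipe": {"cooking": 1.0},
    "oven": {"cooking": 0.8},
    "curry": {"cooking": 0.9, "travel": 0.2},
    "network": {"computers": 1.0},
}


def extract_concepts(result_texts, top_n=2):
    """Return the top_n concepts with the highest accumulated weight."""
    scores = Counter()
    for text in result_texts:
        for word in text.lower().split():
            word = word.strip('.,;:"')  # drop simple trailing punctuation
            for concept, weight in WEIGHTED_TOPICS.get(word, {}).items():
                scores[concept] += weight
    return [concept for concept, _ in scores.most_common(top_n)]


def update_dynamic_profile(dynamic_profile, concepts):
    """Append concepts not yet in the profile; skip those already stored."""
    for concept in concepts:
        if concept not in dynamic_profile:
            dynamic_profile.append(concept)
    return dynamic_profile
```

Keeping the profile as an ordered list is a design choice of this sketch: the order of insertion then records the path the user has taken through the information space.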

If the user starts a new query by entering one or more new keywords, e.g.,
based on the results returned in the previous query, a procedure similar to that outlined above
is followed. The difference now is that the content of memory 112 is taken into account as
well in order to determine the context. The content of memory 112 thus indicates the path
taken by the user while browsing the information space of document base 102.

The user may change his/her focus of interest during his/her interaction with
document base 102. If the user next enters one or more query words that relate to a
topic that bears no relation to the context of the preceding query, system 100 detects a context
shift. Context shifts are monitored and are used to change the user's dynamic profile 112
in order to modify the context of the previous queries. Upon a context shift, dynamic profile
112 initially does not affect the query, as there are no concept words stored that relate to the
new topic.

The above is illustrated by the following examples. Assume that the user has
been interacting with system 100 using in succession the query words "dining", "recipes",
"curry". The context derived from these entries is "cooking" or "food preparation". If the user
now enters keywords "processor" and "micro", the dynamic part of the profile lets these terms
be interpreted as "food processor" and "microwave oven", respectively, and identifies
documents relating to the latter issues. Had the user been interacting with system 100 using,
e.g., "parallel", "computing" and "algorithms", the same terms "processor" and "micro" would
have been interpreted as "data processor" or "signal processor" and "microprocessor" within
the context established by the dynamic profile as relating to "data processing" and
"computers".

As another example, assume that the user is initially interested in ideas on how
to invest money. A user looking for a document on "investing" may have started off by
entering via client 104 the keywords "investments" and "banking" into system 100. System
100 processes these terms and retrieves documents that match this query. System 100 returns
results to client 104 that represent the documents retrieved. The queries are then enhanced by
adding and dropping a few keywords. The user browses through these results and gets
attracted to the idea of on-line banking. In the next query the user adds a term "on-line" and
queries system 100 anew. This leads towards articles about a bill pay system of a particular
bank and the query is further enhanced by adding the term "pay bill". After arriving at a
desired result, the user either quits the search, or shifts interest to another topic altogether, say,
"computer networking architectures". This is referred to as a context shift: a shift in the query
that indicates a change of interest. By understanding such context shifts it is possible to narrow
the user's search path. For example, a search for "ATM" could imply either information
regarding "asynchronous transmission mode networking protocol" or "Automated Teller
Machines". Within the context of banking, the term "ATM" would have led to documents on
"Automated Teller Machines". Within the context of "networking architectures", the term
"ATM" now leads to documents concerned with "asynchronous transmission mode".
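
The disambiguation in these examples can be sketched in a few lines of Python. The sense table `SENSES`, the function name `interpret` and the rule that the most recently stored concept wins are assumptions chosen for illustration; the text does not prescribe how the dynamic profile is consulted.

```python
# Hypothetical sense table: ambiguous term -> {concept: interpretation}.
SENSES = {
    "ATM": {
        "banking": "Automated Teller Machines",
        "networking": "asynchronous transmission mode",
    },
    "processor": {
        "cooking": "food processor",
        "computers": "data processor",
    },
}


def interpret(term, dynamic_profile):
    """Resolve a term using the most recent matching concept in the profile.

    Returns a list with one interpretation when the profile decides,
    or all known interpretations when it offers no guidance.
    """
    senses = SENSES.get(term, {})
    for concept in reversed(dynamic_profile):  # most recent concept first
        if concept in senses:
            return [senses[concept]]
    return list(senses.values())
```

After a session on banking, `interpret("ATM", ["banking"])` selects the teller-machine sense; once "networking" has been added to the profile after a context shift, the same call selects the networking sense.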

WO 99/67698 PCT/IB99/01089

A context shift is detected using the generation algorithm mentioned above that
uses a topical partitioning of the information space of document base 102 and a weighted
topical dictionary. If, for example, the distance between a newly entered keyword and the
keywords representing the current dynamic profile in memory 112 is too large, it is safe to
assume that there is a context shift. This distance is obtained, for example, by computing a
degree of overlap between successive query terms. The query terms used to compute the shift
include the terms added on by analyzer 110. The larger the overlap, the higher the probability
that the query takes place within the same context. If there is no overlap it is safe to assume
that there is a context shift. When a context shift is detected, system 100 automatically maps
the user's queries to another part of the topical information space. At the same time, system
100 continues to build up an access history as the user now browses a different part of
document base 102.
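
The overlap test for context shifts can be sketched as follows. The text only speaks of "a degree of overlap", so the Jaccard ratio used here, the default threshold of zero and the names `overlap` and `is_context_shift` are assumptions of this sketch rather than the measure the system necessarily uses. Both term sets are expected to include the concept keywords added by the analyzer.

```python
def overlap(query_terms, profile_terms):
    """Jaccard overlap between two term collections, in the range 0.0 to 1.0."""
    a, b = set(query_terms), set(profile_terms)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_context_shift(query_terms, profile_terms, threshold=0.0):
    """Report a shift when the overlap does not exceed the threshold.

    An empty profile means there is no earlier context to shift away from,
    so the first query of a session is never reported as a shift.
    """
    if not profile_terms:
        return False
    return overlap(query_terms, profile_terms) <= threshold
```

With the default threshold, only a complete absence of shared terms counts as a shift, which matches the "no overlap" case in the text; a higher threshold would make detection more sensitive.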

CLAIMS:



1. A method of enabling a user to query an electronic document base, the method
comprising:

- determining a topical context for the query, the determining comprising:

  - extracting from an access history of the user at least one concept object associated with
    the current query; and

  - using the concept object to create at least part of a user profile;

- identifying one or more documents in the document base under control of the user profile;
and

- updating the access history based on the identified document.

2. The method of claim 1, wherein the access history is formed by logging 
respective relationships between multiple objects indicative of respective queries and wherein: 

- the extracting comprises using the relationships for finding the concept object. 

3. The method of claim 1, wherein the updating comprises:

- generating a further concept object based on the document identified; 

- storing the further concept object in the user profile. 

4. The method of claim 1, wherein the updating comprises:

- generating a further concept object based on the document identified;

- verifying if the further concept object is absent from the user profile;

- storing the further concept object in the user profile if the further concept object was
absent; and

- skipping the storing if the further concept object was present in the user profile.

5. The method of claim 1, comprising enabling the user to specify in advance at
least another part of the user profile. 

6. The method of claim 5, wherein the user is enabled to specify in the other part
of the user profile which specific ones of the documents identified are to be excluded from
being made available to the user.

7. A software agent for enabling a user to interact with a data base in an electronic
system, the agent comprising:

- a first profile part representing a user's field of interest for enabling forming a context within
the data base; and

- a second profile part representing a user's context shift in the course of a browsing
interaction for enabling dynamically modifying the context.

8. A system enabling a user to query an electronic document base (102), the
system comprising:

- a generator (108) for generating a topical context for a query entered by the user, the
generator being arranged to extract from an access history of the user at least one concept
object associated with the current query and to use the concept object to create at least part of
a user profile;

- an analyzer (110) for identifying, under control of the user profile, one or more documents in
the document base; and

- updating means for updating the access history of the user on the basis of the identified
document.